Abstract The bijection between composition structures and random closed subsets of the unit interval
implies that the composition structures associated with $S \cap [0, 1]$ for a self-similar random set $S \subset \mathbb{R}_+$ are
those which are consistent with respect to a simple truncation operation. Using the standard coding of
compositions by finite strings of binary digits starting with a 1, the random composition of n is defined
by the first n terms of a random binary sequence of infinite length. The locations of 1s in the sequence
are the places visited by an increasing time-homogeneous Markov chain on the positive integers if and
only if $S = \exp(-W)$ for some stationary regenerative random subset W of the real line. Complementing
our study in previous papers, we identify self-similar Markovian composition structures associated with
the two-parameter family of partition structures.

1 Introduction 

A composition of n is a sequence $\lambda = (\lambda_1, \ldots, \lambda_\ell)$ of some number $\ell$ of positive integer parts $\lambda_i$ with
$\sum_{i=1}^{\ell} \lambda_i = n$. We may regard $\lambda$ as a distribution of n identical balls in a row of $\ell$ boxes, with $\lambda_i$ the
number of balls in the $i$th box from the left end of the row. Thus the composition (2, 4, 1, 2) of 9 may be
represented in balls-in-boxes notation as

(2,4,1,2) <-> [00] [0000] [0] [00] (1)

or recoded in binary notation by replacing each "[0" in the balls-in-boxes notation by 1 and ignoring each 
"]", to obtain in this example 

(2,4,1,2) <-> 101000110.

In general, the first digit in the binary notation of a composition must be a 1, but the remaining digits
can be chosen freely, so there are $2^{n-1}$ different compositions of n. Two other notations will be useful.
We write $\lambda^{\leftarrow}$ for the reversal of $\lambda$ and $\lambda^{\downarrow}$ for the decreasing rearrangement of $\lambda$, also called the partition
derived from $\lambda$. For instance

$(2,4,1,2)^{\leftarrow}$ = (2,1,4,2) <-> [00] [0] [0000] [00] <-> 101100010

$(2,4,1,2)^{\downarrow}$ = (4,2,2,1) <-> [0000] [00] [00] [0] <-> 100010101
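
The binary coding, the reversal and the decreasing rearrangement can be sketched in a few lines of Python. This is only an illustration of the conventions above; the function names (`to_binary`, `from_binary`, `reversal`, `ranked`, `all_compositions`) are hypothetical and do not come from the paper.

```python
from itertools import product


def to_binary(parts):
    """Encode a composition: a part of size r becomes '1' followed by r-1 '0's."""
    if any(r < 1 for r in parts):
        raise ValueError("parts must be positive integers")
    return "".join("1" + "0" * (r - 1) for r in parts)


def from_binary(bits):
    """Decode a 0/1 string starting with '1' back into a composition."""
    if not bits or bits[0] != "1" or set(bits) - {"0", "1"}:
        raise ValueError("expected a nonempty 0/1 string starting with 1")
    parts = []
    for bit in bits:
        if bit == "1":
            parts.append(1)   # a '1' opens a new box holding one ball
        else:
            parts[-1] += 1    # a '0' adds a ball to the current box
    return tuple(parts)


def reversal(parts):
    """The composition read from right to left."""
    return tuple(reversed(parts))


def ranked(parts):
    """Decreasing rearrangement, i.e. the partition derived from the composition."""
    return tuple(sorted(parts, reverse=True))


def all_compositions(n):
    """All compositions of n >= 1: the first digit is 1, the other n-1 are free."""
    return [from_binary("1" + "".join(tail)) for tail in product("01", repeat=n - 1)]
```

Enumerating through the free digits makes the count of compositions of n visible directly as the number of binary strings of length n - 1.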

A random composition of n is a random variable $C_n$ with values in the set of all compositions of n. We
are interested in sequences of random compositions $(C_n)$ which are consistent as n varies with respect to
various reduction operations. In the balls-in-boxes description, let the places of the n balls be indexed
from left to right by the set $[n] := \{1, \ldots, n\}$. Let $(Y_n)$ be a sequence of random variables with $Y_n \in [n]$
for each n, with $Y_n$ independent of $C_n$. Let $\tilde{C}_n$ be the composition of $n - 1$ obtained by deleting the ball
in place $Y_n$ from the balls-in-boxes representation of $C_n$. For instance, if

$C_9$ = (2,4,1,2) <-> [00] [0000] [0] [00] <-> 101000110

as above, and $Y_9 = 6$, then

$\tilde{C}_9$ = (2,3,1,2) <-> [00] [000] [0] [00] <-> 10100110.

Whereas if instead $Y_9 = 7$, then

$\tilde{C}_9$ = (2,4,2) <-> [00] [0000] [00] <-> 10100010.
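
The reduction operation in this example can be written out as a short Python sketch. The helper name `delete_ball` is hypothetical; places are indexed from 1 on the left, as in the text.

```python
def delete_ball(parts, place):
    """Remove the ball at the given place (1-indexed from the left).

    A box left empty by the removal disappears from the composition.
    """
    n = sum(parts)
    if not 1 <= place <= n:
        raise ValueError("place must lie in [n]")
    reduced = list(parts)
    cumulative = 0
    for i, size in enumerate(reduced):
        cumulative += size
        if place <= cumulative:   # the ball sits in box i
            reduced[i] -= 1
            break
    return tuple(r for r in reduced if r > 0)
```

Left reduction corresponds to `place = 1` and right reduction to `place = sum(parts)`; uniform reduction picks `place` uniformly at random from the n places.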
We say that the sequence of compositions $(C_n)$ is $(Y_n)$-consistent if there is the equality in distribution

$\tilde{C}_n \stackrel{d}{=} C_{n-1}$ for every n = 2, 3, . . . (2)

where $\tilde{C}_n$ is $C_n$ reduced by deletion of the ball in place $Y_n$. Then, by Kolmogorov's extension theorem,
the sequence $(C_n)$ can be realised jointly with $(Y_n)$ on a common probability space, so that the equality
in (2) also holds almost surely. We say that such a realisation of $(C_n)$ is strong $(Y_n)$-consistent. We are
particularly concerned with the operations of uniform, left and right reduction corresponding to $Y_n$ with
uniform distribution on [n], to $Y_n = 1$, and to $Y_n = n$. That is, removal of a ball picked uniformly at
random, or the left-most ball, or the right-most ball. So we may call $(C_n)$ uniform-, left- or right-consistent
as the case may be. Note that $(C_n)$ is left-consistent iff $(C_n^{\leftarrow})$ is right-consistent.

There is an obvious bijection between sequences of distributions of $C_n$ which are right-consistent and
probability distributions of infinite binary sequences $(\xi_1, \xi_2, \ldots)$ with $\xi_1 = 1$: a strong right-consistent
realisation of $C_n$ in binary notation is the truncation $(\xi_1, \xi_2, \ldots, \xi_n)$ of the infinite binary sequence. An
alternate representation is obtained by replacing $(\xi_j)$ by the random set of positive integers $\{j \ge 1 : \xi_j = 1\}$.
Thus right-consistent sequences of compositions may be identified with random subsets of positive
integers which contain 1.

A uniform-consistent sequence of random compositions $(C_n)$ is also called a composition structure.
The corresponding sequence of random partitions $(C_n^{\downarrow})$ is then a partition structure in the sense of
Kingman (see ^H] for a survey and background). That is to say,

$(\tilde{C}_n)^{\downarrow} \stackrel{d}{=} C_{n-1}^{\downarrow}$ for every n = 2, 3, . . . (3)

where the left side is the decreasing rearrangement of a reduction of $C_n$ by removal of a uniformly chosen
random ball. Kingman gave a representation of partition structures which Gnedin refined as follows:

Theorem 1 [H] Let $(C_n)$ be a composition structure. Then there exists a random closed subset Z of
[0,1] such that a strong uniform-consistent realisation of $(C_n)$ can be constructed as follows: let $(U_i)$ be
a sequence of independent uniform [0, 1] variables, independent of Z, and let $C_n$ be the sequence of sizes
of equivalence classes among $U_1, \ldots, U_n$, listed left to right, as these points are classified by the random
equivalence relation ~ induced by $U_i \sim U_j$ for $i \ne j$ iff $U_i$ and $U_j$ fall in the same interval component of
$[0, 1] \setminus Z$.

Remark. For consistency with further considerations in this paper we include the point 1 in Z only if 1 
is not an isolated point in Z. Thus, if the rightmost interval of [0, 1] \ Z exists, it is semiopen. 
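
The sampling construction of Theorem 1 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that Z is a finite set of points (a general closed set is not handled, and the convention of the Remark about the point 1 is ignored); the name `composition_from_sample` is hypothetical.

```python
import bisect


def composition_from_sample(z_points, sample):
    """Composition induced by sample points classified by the gaps of a finite set Z.

    Two distinct sample points are equivalent iff they fall in the same
    component of the complement of Z; a point hitting Z is a singleton class.
    Class sizes are listed from left to right.
    """
    z = sorted(set(z_points))
    parts = []
    previous_gap = None
    for u in sorted(sample):
        k = bisect.bisect_left(z, u)            # number of points of Z strictly below u
        hits_z = k < len(z) and z[k] == u
        gap = None if hits_z else k             # gaps are labelled by that count
        if gap is not None and gap == previous_gap:
            parts[-1] += 1                      # same gap as the previous point
        else:
            parts.append(1)                     # new gap, or a point of Z itself
        previous_gap = gap
    return tuple(parts)
```

Applying the function to the first n - 1 and the first n points of one fixed sequence gives compositions that differ by deletion of a single ball, which is the strong uniform-consistency described in the theorem.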

We are most interested in the case when Z is light, meaning that the Lebesgue measure of Z equals 0
almost surely. The collection of component intervals of $[0, 1] \setminus Z$ then defines a random interval partition
of [0, 1], that is a collection of open subintervals of [0, 1], the sum of whose lengths is 1. The collection
of lengths of component intervals of $[0, 1] \setminus Z$, suitably indexed, is then a random discrete distribution as
studied in JJj. We regard the composition structure $(C_n)$ as a combinatorial representation of either Z or
its associated interval partition, just as the partition structure $(C_n^{\downarrow})$ may be regarded as a combinatorial
representation of the unordered collection of interval lengths.
In a series of previous papers El 13 El EH EE EG El ED E] , we have studied the composition structures,
partition structures, random interval partitions, and random discrete distributions, corresponding
to various random subsets of [0, 1] of particular interest. Here we tie together some threads from these
previous studies, to show how various analytic properties of the random subset Z of [0, 1], which are natural
from the perspective of continuous parameter stochastic processes, correspond to various combinatorial
properties of the associated composition structure $(C_n)$.

A random closed subset S of $\mathbb{R}_+$ is called self-similar (or scale-invariant) if

$cS \stackrel{d}{=} S$ for all c > 0. (4)

In Section 2 we establish the following result, which generalizes a construction introduced in [14] in the
case discussed in Example 2 below.

Theorem 2 For a sequence of distributions of random compositions $(C_n)$ the following two conditions
are equivalent:

• $(C_n)$ is both uniform-consistent and right-consistent.

• $(C_n)$ can be derived by uniform sampling from $S \cap [0, 1]$ for some self-similar random closed subset
S of $\mathbb{R}_+$.

When these conditions hold, a strong right-consistent version of $(C_n)$ can be constructed as follows:
independent of S, let $\epsilon_1 < \epsilon_2 < \cdots$ be the atoms of a homogeneous Poisson point process (henceforth PPP)
on $\mathbb{R}_+$, and let the binary representation of $C_n$ be the first n digits of $(\xi_j)$ defined by $\xi_1 = 1$ and for j > 1

$\xi_j = 1(S \cap [\epsilon_{j-1}, \epsilon_j] \ne \emptyset)$.

Note that there are two quite different realisations of $(C_n)$, one that is strong uniform-consistent,
obtained by uniform sampling from $S \cap [0, 1]$ as described in Theorem 1, and one that is strong
right-consistent, obtained by Poisson sampling from $S \subset [0, \infty[$ as described in Theorem 2. Obviously, it is
impossible to construct $(C_n)$ to be simultaneously strong uniform-consistent and strong right-consistent.
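
The Poisson sampling construction can be sketched as follows. The sketch assumes that the digit at place j > 1 is 1 exactly when S meets the closed interval between the (j-1)th and jth Poisson points, and that S is given as a finite list of points (a stand-in for a genuinely self-similar set, which this code does not generate); `poisson_points` and `poisson_digits` are hypothetical names.

```python
import bisect
import random


def poisson_points(n, rate=1.0, rng=None):
    """First n atoms of a homogeneous Poisson process on the half-line."""
    rng = rng or random.Random()
    points, t = [], 0.0
    for _ in range(n):
        t += rng.expovariate(rate)   # independent exponential spacings
        points.append(t)
    return points


def poisson_digits(s_points, epsilons):
    """Binary digits: the first is 1, the j-th is 1 iff S meets [eps_{j-1}, eps_j]."""
    s = sorted(s_points)
    digits = ["1"]
    for lo, hi in zip(epsilons, epsilons[1:]):
        i = bisect.bisect_left(s, lo)                  # first point of S that is >= lo
        digits.append("1" if i < len(s) and s[i] <= hi else "0")
    return "".join(digits)
```

Because the digits are read off one fixed increasing sequence of points, the digits for the first n points are a prefix of those for the first n + 1 points, which is the strong right-consistency of this realisation.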

Example 1 0121111121 Let $(\xi_1, \xi_2, \ldots)$ be a random Bernoulli string with independent digits and distribution

$P(\xi_j = 1) = 1 - P(\xi_j = 0) = \theta/(j + \theta - 1)$,

where $0 < \theta < \infty$ is a parameter. Let $C_n$ be encoded by the first n digits, so $(C_n)$ is strong right-consistent
by construction. It is elementary that $C_n$ is distributed according to the formula

$p(\lambda) = \frac{\theta^{\ell}\, n!}{(\theta)_n} \prod_{j=1}^{\ell} \frac{1}{\Lambda_j}$ (5)

where $(\theta)_n = \theta(\theta + 1) \cdots (\theta + n - 1)$ and $\Lambda_j = \lambda_1 + \cdots + \lambda_j$. This is a variant of the Ewens sampling
formula [HI El; which gives the distribution of the sizes of blocks, in reverse size-biased order, of a Ewens
partition of n with parameter $\theta$. In the case $\theta = 1$ the sequence $(\xi_j)$ results from encoding the cycle
partition of a uniform random permutation of [n] by Feller coupling [2]. This sequence also appears
in the theory of extremes as the sequence of record indicators of independent identically distributed
observations with continuous distribution 0]. The uniform-consistency of $(C_n)$ was observed in [5]. It is
known that the random set Z in Kingman's representation is the restriction to [0, 1] of the self-similar
random set S which is the union of {0} and the set of points of a scale-invariant Poisson process on $[0, \infty[$,
with intensity $\theta\, dx/x$, x > 0. Properties of this scale-invariant Poisson process are reviewed in [J. Two
trivial composition structures appear as limiting cases for $\theta \downarrow 0$ and $\theta \uparrow \infty$.
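
As a sanity check on Example 1, the following sketch compares the product of Bernoulli digit probabilities with a closed form obtained by simplifying that product, namely $\theta^{\ell} n!/((\theta)_n \prod_j \Lambda_j)$. Exact rational arithmetic is used; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def compositions(n):
    """All compositions of n >= 1, generated from their binary codes."""
    for tail in product("01", repeat=n - 1):
        parts = [1]
        for bit in tail:
            if bit == "1":
                parts.append(1)
            else:
                parts[-1] += 1
        yield tuple(parts)


def rising(x, n):
    """Rising factorial (x)_n = x (x+1) ... (x+n-1)."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result


def bernoulli_cpf(parts, theta):
    """Probability of a composition from independent digits with P(1 at j) = theta/(j+theta-1)."""
    n = sum(parts)
    starts, position = set(), 1
    for r in parts:
        starts.add(position)       # places where a new box begins, i.e. digit 1
        position += r
    prob = Fraction(1)
    for j in range(1, n + 1):
        p_one = theta / (j + theta - 1)
        prob *= p_one if j in starts else 1 - p_one
    return prob


def ewens_cpf(parts, theta):
    """Closed form theta^l n! / ((theta)_n * product of partial sums of the parts)."""
    n = sum(parts)
    prob = theta ** len(parts) * factorial(n) / rising(theta, n)
    cumulative = 0
    for r in parts:
        cumulative += r
        prob /= cumulative
    return prob
```

With `theta = Fraction(1)` the closed form reduces to the reciprocal of the product of partial sums, the case connected in the text with the cycle partition of a uniform random permutation.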

Example 2 [14, 15] Let $\xi_j = 1(R_k = j \text{ for some } k \ge 0)$ where $R_0 = 1$ and $R_k = 1 + X_1 + \cdots + X_k$ is the
discrete renewal process derived from independent and identically distributed $X_j$ with

$P(X_j = r) = (-1)^{r-1} \binom{\alpha}{r}, \quad r = 1, 2, \ldots$

where $0 < \alpha < 1$. The corresponding right-consistent sequence of compositions has distribution

$P(C_n = \lambda) = \lambda_{\ell}\, \alpha^{\ell - 1} \prod_{j=1}^{\ell} \frac{(1 - \alpha)_{\lambda_j - 1}}{\lambda_j!}$ (6)


That this sequence of compositions $(C_n)$ is both right-consistent and uniform-consistent was shown in
[14], where the two different strong consistent constructions of Theorems 1 and 2 were given for S the
self-similar zero set of a Bessel process of dimension $2 - 2\alpha$, and $Z = S \cap [0,1]$. The trivial cases appear
again as limits for $\alpha = 0$ or 1.
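
The distribution in Example 2 can be cross-checked numerically. The sketch below computes the law of $C_n$ directly from the renewal description (all parts but the last are renewal spacings, and the last spacing is at least as large as the last part) and compares it with the product form $\lambda_\ell \alpha^{\ell-1} \prod_j (1-\alpha)_{\lambda_j-1}/\lambda_j!$. It uses the identity $(-1)^{r-1}\binom{\alpha}{r} = \alpha(1-\alpha)_{r-1}/r!$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def compositions(n):
    """All compositions of n >= 1, generated from their binary codes."""
    for tail in product("01", repeat=n - 1):
        parts = [1]
        for bit in tail:
            if bit == "1":
                parts.append(1)
            else:
                parts[-1] += 1
        yield tuple(parts)


def rising(x, n):
    """Rising factorial (x)_n."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result


def step_prob(r, alpha):
    """P(X = r) = (-1)^(r-1) binom(alpha, r) = alpha (1-alpha)_{r-1} / r!."""
    return alpha * rising(1 - alpha, r - 1) / factorial(r)


def tail_prob(r, alpha):
    """P(X >= r), computed as one minus the mass below r."""
    return 1 - sum(step_prob(k, alpha) for k in range(1, r))


def renewal_cpf(parts, alpha):
    """Law of C_n read off the renewal sequence truncated at n."""
    prob = Fraction(1)
    for r in parts[:-1]:
        prob *= step_prob(r, alpha)
    return prob * tail_prob(parts[-1], alpha)


def product_form_cpf(parts, alpha):
    """Product form: last part times alpha^(l-1) times prod (1-alpha)_{r-1} / r!."""
    prob = parts[-1] * alpha ** (len(parts) - 1)
    for r in parts:
        prob *= rising(1 - alpha, r - 1) / factorial(r)
    return prob
```

Agreement of the two functions for small n is only a finite check, not a proof, but it catches sign or index slips in the product form quickly.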

It was shown by J. Young \F3\ that no other choice of distribution either for a sequence of independent
Bernoulli variables (as in Example 1), or for a renewal sequence with independent spacing between 1's
(as in Example 2) yields a right-consistent sequence of compositions $(C_n)$ such that $(C_n^{\downarrow})$ is a partition
structure. As the latter condition is weaker than uniform-consistency of $(C_n)$, no more right-consistent
composition structures can be obtained from these constructions. However, we will show that a
construction adopted from ^| H! allows an interesting extrapolation of the above examples to obtain a
right-consistent composition structure with two parameters $(\alpha, \theta)$ with $0 < \alpha < 1$ and $\theta > -\alpha$,
corresponding to the two-parameter Ewens-Pitman family of partition structures.

To emphasise the general correspondence between composition structures and random sets provided
by Theorem 1 we use terminology for composition structures to reflect properties of their associated
random sets. So we prefer the term self-similar rather than right-consistent for the composition structure
$(C_n)$ obtained by uniform sampling from $S \cap [0, 1]$ for a self-similar random set S. In [5] we described the
random sets associated with composition structures $(C_n)$ with the following left-regenerative property:
for every n and $1 \le x < n$, conditionally given the leftmost part of $C_n$ is x, the remaining composition of
$n - x$ is a distributional copy of $C_{n-x}$. Here we find it more convenient to work with the right-regenerative
property, defined in the same way with the rightmost part instead of the leftmost part. Evidently, $(C_n)$
derived by uniform sampling from Z is right-regenerative iff $(C_n^{\leftarrow})$ derived by uniform sampling from $1 - Z$
is left-regenerative. So the main result of [5] can be restated as follows: a composition structure $(C_n)$ is
right-regenerative iff $(C_n)$ is derived by uniform sampling from $e^{-W}$ for W a regenerative random subset of
$[0, \infty[$. Here we distinguish a class of Markov composition structures such that the binary representation
of $C_n$ has 1's at the places visited by a decreasing Markov chain on [n] with some transition matrix q
which does not depend on n, and some initial distribution $q^*(n, \cdot)$ on [n]. These turn out to be derived
by uniform sampling from $e^{-W}$ for W a delayed regenerative random subset of $[0, \infty[$. In the special case
when W is a stationary regenerative set, $e^{-W}$ is the restriction to [0, 1] of a self-similar random subset
of $[0, \infty[$. The self-similar Markov compositions so obtained turn out to be those whose infinite binary
representation has 1's at the places visited by an increasing Markov chain on $\mathbb{N}$. Finally, extending our
study in , we introduce self-similar Markov composition structures associated with the two-parameter
Ewens-Pitman family of partition structures.

2 Self-similar composition structures 
2.1 Composition probability function 

The distribution of a random composition $C_n$ of integer n is a nonnegative function

$p(\lambda) = P(C_n = \lambda)$

on compositions $\lambda$ of n which satisfies $\sum_{|\lambda| = n} p(\lambda) = 1$. Here and henceforth $|\lambda|$ denotes the sum of parts
of a composition $\lambda$. For a general sequence of random compositions $(C_n)$ these marginal distributions
are described by a composition probability function (CPF) defined for all compositions of integers. A
sequence of compositions is $(Y_n)$-consistent iff the CPF satisfies a linear recurrence of the form

$p(\lambda) = \sum_{\mu : |\mu| = n} \kappa(\mu, \lambda)\, p(\mu), \quad |\lambda| = n - 1,$ (7)

where $\kappa(\mu, \lambda)$ for $\mu$ with $|\mu| = n$ is a matrix describing the transition probabilities from compositions of n
to compositions of $n - 1$ determined in the balls-in-boxes representation by removal of a ball from place
$Y_n$. See [SI El for details in the case of uniform-consistency when $Y_n$ has uniform distribution on [n]. For
$(C_n)$ that is right-consistent, the recurrence is just the linear relation

$p(\lambda_1, \ldots, \lambda_\ell) = p(\lambda_1, \ldots, \lambda_\ell + 1) + p(\lambda_1, \ldots, \lambda_\ell, 1)$. (8)
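
Both kinds of consistency can be tested mechanically for a given CPF. The sketch below checks the right-consistency relation and the uniform-consistency recurrence in exact arithmetic, using as test case a closed form for the Bernoulli CPF of Example 1, $\theta^{\ell} n!/((\theta)_n \prod_j \Lambda_j)$, which is assumed here rather than quoted. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def compositions(n):
    """All compositions of n >= 1, generated from their binary codes."""
    for tail in product("01", repeat=n - 1):
        parts = [1]
        for bit in tail:
            if bit == "1":
                parts.append(1)
            else:
                parts[-1] += 1
        yield tuple(parts)


def ewens_cpf(parts, theta):
    """Assumed closed form for Example 1: theta^l n! / ((theta)_n prod of partial sums)."""
    n = sum(parts)
    prob = theta ** len(parts) * Fraction(factorial(n))
    for i in range(n):
        prob /= theta + i
    cumulative = 0
    for r in parts:
        cumulative += r
        prob /= cumulative
    return prob


def uniform_reduction(parts):
    """Reduced compositions after deleting a uniformly chosen ball, with probabilities."""
    n = sum(parts)
    result = {}
    for i, r in enumerate(parts):
        reduced = parts[:i] + ((r - 1,) if r > 1 else ()) + parts[i + 1:]
        result[reduced] = result.get(reduced, 0) + Fraction(r, n)
    return result


def is_right_consistent(cpf, n):
    """Check p(lam) = p(lam with last part + 1) + p(lam followed by a part 1)."""
    return all(cpf(lam) == cpf(lam[:-1] + (lam[-1] + 1,)) + cpf(lam + (1,))
               for lam in compositions(n))


def is_uniform_consistent(cpf, n):
    """Check that uniform reduction of C_{n+1} reproduces the law of C_n."""
    pushed = {}
    for mu in compositions(n + 1):
        for lam, weight in uniform_reduction(mu).items():
            pushed[lam] = pushed.get(lam, 0) + weight * cpf(mu)
    return all(pushed[lam] == cpf(lam) for lam in compositions(n))
```

For contrast, independent fair-coin digits give a CPF that passes the right-consistency check but fails the uniform one already at n = 2, so the two conditions are genuinely different.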



2.2 Proof of Theorem 2 

We start by remarking that the distribution of a self-similar set S is uniquely determined by the
distribution of its restriction Z to [0, 1], which satisfies the condition equivalent to (4):

$c\,(S \cap [0,1]) \stackrel{d}{=} S \cap [0,c]$, for $0 < c < 1$. (9)

This follows from the known fact that the distribution of a stationary set $W \subset \mathbb{R}$ (invariant under shifts)
is determined by the distribution of $W \cap \mathbb{R}_+$, and we can transform a self-similar S into a stationary set
$W := -\log S$.

For n = 1, 2, . . . let $Z_n \subset [0, 1[$ be a finite set encoding $C_n$ via the correspondence $(\lambda_1, \ldots, \lambda_\ell) \mapsto \{0, \Lambda_1/n, \ldots, \Lambda_{\ell-1}/n\}$
for $\Lambda_j = \lambda_1 + \cdots + \lambda_j$. Assuming now that we are working with a strong uniform-consistent
realisation of $(C_n)$, by the law of large numbers [B] the Hausdorff distance between $Z_n \cup \{1\}$
and $Z \cup \{1\}$ goes to 0 with probability 1. The Hausdorff distance between $Z_n$ and Z also goes to 0. This
can be shown by considering the last block of the composition, which has a positive frequency if and only
if 1 is not an accumulation point for Z. In this sense, $Z_n \to Z$ a.s., hence also $Z_{\lfloor nx \rfloor} \to Z$ a.s. for every
$x \in\, ]0, 1[$. Translating the truncation property in terms of the $Z_n$'s we obtain

$Z_n \cap [0, \lfloor nx \rfloor / n[ \;\stackrel{d}{=}\; \frac{\lfloor nx \rfloor}{n}\, Z_{\lfloor nx \rfloor}$.

For $n \to \infty$ the left side converges to $Z \cap [0, x]$ a.s., while the right side converges to $xZ$ a.s., hence the
limits must have the same distribution. This means that Z is self-similar, by (9). The strong right-consistent
representation is obtained by noting that the scaling $(S, \epsilon_1, \ldots, \epsilon_n) \to (S/\epsilon_{n+1}, \epsilon_1/\epsilon_{n+1}, \ldots, \epsilon_n/\epsilon_{n+1})$
transforms S to a copy of itself (by self-similarity) and maps the first n Poisson points to the increasing
sequence of n uniform order statistics. □



2.3 Some definitions 

We call Z heavy if Z has positive Lebesgue measure with nonzero probability, and we call Z light otherwise.
The set Z can be discrete (as in Example 1) or perfect (as in Example 2) or neither discrete nor perfect.
For $x \in \mathbb{R}_+$ introduce

$G_x := \sup(Z \cap [0, x]), \quad A_x := x - G_x, \quad D_x := \inf(Z \cap\, ]x, \infty[)$.

The age process $(A_x, x \ge 0)$ uniquely determines Z. In the event $x \in Z$ we have $G_x = x$ and $A_x = 0$,
while in the event $x \notin Z$ the point x is covered by an open gap $]G_x, D_x[\; \subset \mathbb{R}_+ \setminus Z$. The interval $]G_1, 1[$
of length $A_1$ is called the meander. If Z is heavy the meander may be empty with positive probability
(then $A_1 = 0$), while for light Z the meander is nondegenerate (and $A_1 > 0$ a.s.).
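
For a finite set of points standing in for Z (an illustrative assumption), the last zero before x, the age at x and the first zero after x can be computed by binary search. The names `last_zero`, `age` and `next_zero` are hypothetical.

```python
import bisect
import math


def last_zero(z_sorted, x):
    """G_x: the largest point of Z in [0, x]; Z is given as a sorted list."""
    i = bisect.bisect_right(z_sorted, x)
    if i == 0:
        raise ValueError("Z has no point in [0, x]")
    return z_sorted[i - 1]


def age(z_sorted, x):
    """A_x = x - G_x; it vanishes exactly when x belongs to Z."""
    return x - last_zero(z_sorted, x)


def next_zero(z_sorted, x):
    """D_x: the smallest point of Z strictly to the right of x (infinity if none)."""
    i = bisect.bisect_right(z_sorted, x)
    return z_sorted[i] if i < len(z_sorted) else math.inf
```

In this notation the meander length is `age(z_sorted, 1.0)`.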



3 Block counts, meander and the tagged interval 
3.1 The structural distribution 

For $Z \subset [0, 1]$ a random closed set and U a uniform random point independent of Z let V be the size of
the gap in Z covering U in case $U \in [0, 1] \setminus Z$, and let $V = 0$ in case $U \in Z$. The gap covering a random
point is sometimes called the tagged interval.






Let $(V_j)$ be the decreasing sequence of lengths of gaps comprising $[0, 1] \setminus Z$, so that $\sum_j V_j \le 1$ and
$1 - \sum_j V_j$ is the Lebesgue measure of Z. If all the $V_j$'s are pairwise distinct with probability one, then

$P(V = V_j \mid V_1, V_2, \ldots) = V_j, \qquad P(V = 0 \mid V_1, V_2, \ldots) = 1 - \sum_j V_j$.

So V may be called a size-biased pick from the sequence of lengths.

Suppose now that the composition structure $(C_n)$ is derived by uniform sampling from Z. The
distribution of V is called the structural distribution (of $(C_n)$, or of the associated partition structure,
or of the associated random discrete distribution of interval lengths). Recall that p denotes the CPF of
$(C_n)$. Observe that

$p(n) = E V^{n-1}$

because $C_n$ equals the one-part composition (n) when $n - 1$ further uniform points hit the interval of
length V found by U. Other relations of this type are



$\mu_{n,r} := E K_{n,r} = \binom{n}{r} E\left(V^{r-1}(1 - V)^{n-r}\right), \quad 1 \le r \le n,$ (10)

$\mu_n := E K_n = \sum_{r=1}^{n} \mu_{n,r} = E\left(\frac{1 - (1 - V)^n}{V}\, 1(V > 0)\right) + n P(V = 0)$ (11)

where $K_{n,r}$ is the number of parts of $C_n$ of size r, and $K_n = \sum_r K_{n,r}$ is the number of parts of $C_n$. Note
that $(K_{n,r}, 1 \le r \le n)$ is a standard encoding of $C_n^{\downarrow}$, the random partition of n induced by $C_n$.
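
The expected block counts can be checked against a structural distribution in a concrete case. The sketch below takes the Bernoulli composition structure of Example 1, for which the structural distribution is assumed to be Beta(1, θ) (a standard fact about the Ewens family, not stated in the text above), so that $E(V^{r-1}(1-V)^{n-r}) = \theta (r-1)!/(\theta + n - r)_r$. The CPF is again the assumed closed form, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def compositions(n):
    """All compositions of n >= 1, generated from their binary codes."""
    for tail in product("01", repeat=n - 1):
        parts = [1]
        for bit in tail:
            if bit == "1":
                parts.append(1)
            else:
                parts[-1] += 1
        yield tuple(parts)


def rising(x, n):
    """Rising factorial (x)_n."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result


def ewens_cpf(parts, theta):
    """Assumed closed form for Example 1."""
    n = sum(parts)
    prob = theta ** len(parts) * factorial(n) / rising(theta, n)
    cumulative = 0
    for r in parts:
        cumulative += r
        prob /= cumulative
    return prob


def expected_part_counts(n, theta):
    """counts[r] = expected number of parts of size r in C_n, by enumeration."""
    counts = [Fraction(0)] * (n + 1)
    for lam in compositions(n):
        for r in lam:
            counts[r] += ewens_cpf(lam, theta)
    return counts


def mu_from_structural(n, r, theta):
    """binom(n, r) E[V^(r-1) (1-V)^(n-r)] for V ~ Beta(1, theta) (assumption)."""
    moment = theta * factorial(r - 1) / rising(theta + n - r, r)
    return comb(n, r) * moment
```

For θ = 1 the structural distribution is uniform and the expected number of parts of size r is 1/r, matching the classical count of r-cycles in a uniform random permutation.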

Theorem 3 [17] Suppose that $S \subset \mathbb{R}_+$ is self-similar, and let $Z := S \cap [0, 1]$. Let $A_1$ be the length of
the meander interval of Z, that is the rightmost gap in $[0, 1] \setminus Z$, with $A_1 = 0$ if $1 \in Z$, and let V be the
length of the component interval of $[0, 1] \setminus Z$ which contains U independent of Z, with $V = 0$ if $U \in Z$.
Then $A_1$ has the same distribution as V.

Proof. Let $\rho_n$ be the rightmost point in the sample of n uniform points. Given $A_{\rho_n}$, with probability
$(A_{\rho_n}/\rho_n)^{n-1}$ the remaining $n - 1$ sample points fall in the same gap of Z as $\rho_n$. By self-similarity,

$A_{\rho_n}/\rho_n \stackrel{d}{=} A_1$. So the probability that all n points fall in the same gap is

$p(n) = E(A_{\rho_n}/\rho_n)^{n-1} = E A_1^{n-1}$.

Comparing with $p(n) = E V^{n-1}$ we arrive at the conclusion, since a probability distribution on [0, 1] is
determined by its moments. □

Note that the event $(A_1 = V)$ has probability $E A_1 = E V = p(2)$. Pitman and Yor [17] went further to
distinguish a strong sampling property for the meander

$P(A_1 = V_k \mid V_1, V_2, \ldots) = V_k, \quad k = 1, 2, \ldots$ (12)



meaning the condition that the meander length is a size-biased pick from all lengths. This property holds 
in some cases (e.g. for Z in Examples 1 and 2, and in the setup of Theorem 1 1 41 below ) but does not hold 
in general. 



3.2 The last part and the tagged part of composition 

Theorem 3 implies that for a self-similar composition structure, as $n \to \infty$, the frequency of the last block
of $C_n$ has approximately the same distribution as the frequency of the block selected by a size-biased pick.
A stronger fact is true: a similar identity holds for each n, and not only asymptotically. This was already
observed in [14, Proposition 11 (i)] in the case of renewal strings in Example 2. Intuitively, since both
uniform- and right-reduction transform $C_n$ into a composition with the same distribution, it is natural
to expect that the sizes of reduced parts have the same distribution.

Theorem 4 For a composition structure $(C_n)$, let $P_n$ denote the size of a random part of $C_n$ which
given $C_n$ is selected with probability proportional to size. Let $L_n$ be the size of the last part of $C_n$. If $(C_n)$
is right-consistent then $P_n \stackrel{d}{=} L_n$ for all n.

This follows immediately from the following Lemma. 

Lemma 5 Let $C_n$ and $C_{n-1}$ be two random compositions of n and $n - 1$ respectively, defined on a
common probability space in such a way that $C_{n-1}$ is obtained from $C_n$ by removal of a single ball in the
balls-in-boxes representation. Let $(\omega_{n,r}, 1 \le r \le n)$ be the distribution of the number of balls in the same
box of $C_n$ as the ball removed, and let $\mu_{n,r}$ and $\mu_{n-1,r}$ be the expected numbers of boxes containing r balls
for $C_n$ and $C_{n-1}$, respectively, as above in (10). Then the distribution $(\omega_{n,r}, 1 \le r \le n)$ is determined by
the distributions of the partitions generated by $C_{n-1}$ and $C_n$ according to the formulas $\omega_{n,n} = \mu_{n,n}$ and

$\omega_{n,r} - \omega_{n,r+1} = \mu_{n,r} - \mu_{n-1,r} \quad (1 \le r \le n - 1)$.

Proof. Follow the evolution of $K_{n,r}$ as n varies. This variable increases by 1 when a ball is chosen in a
box of $r + 1$ balls (which is impossible for r = n), and decreases by 1 when a ball is chosen in a box of r
balls. The probabilities of these events are $\omega_{n,r+1}$ and $\omega_{n,r}$, respectively. In all other cases the sampling
does not affect $K_{n,r}$. The formula for expected increments follows. □
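
The identity in distribution between the last part and a size-biased part can be verified for small n by enumeration. The sketch below does this for the renewal composition structure of Example 2, using the product form $\lambda_\ell \alpha^{\ell-1} \prod_j (1-\alpha)_{\lambda_j-1}/\lambda_j!$ for its CPF; the function names are hypothetical and the check is a finite illustration, not a proof.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def compositions(n):
    """All compositions of n >= 1, generated from their binary codes."""
    for tail in product("01", repeat=n - 1):
        parts = [1]
        for bit in tail:
            if bit == "1":
                parts.append(1)
            else:
                parts[-1] += 1
        yield tuple(parts)


def rising(x, n):
    """Rising factorial (x)_n."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result


def renewal_cpf(parts, alpha):
    """Product-form CPF of the renewal composition structure of Example 2."""
    prob = parts[-1] * alpha ** (len(parts) - 1)
    for r in parts:
        prob *= rising(1 - alpha, r - 1) / factorial(r)
    return prob


def last_part_law(cpf, n):
    """law[r] = P(L_n = r), the distribution of the last part of C_n."""
    law = [Fraction(0)] * (n + 1)
    for lam in compositions(n):
        law[lam[-1]] += cpf(lam)
    return law


def size_biased_law(cpf, n):
    """law[r] = P(P_n = r), a part picked with probability proportional to its size."""
    law = [Fraction(0)] * (n + 1)
    for lam in compositions(n):
        for r in lam:
            law[r] += cpf(lam) * Fraction(r, n)
    return law
```

The two laws are the distributions of the size of the box hit by right reduction and by uniform reduction respectively, which is how the lemma ties them together.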



Note that Theorem 3 follows from Theorem 4 by the law of large numbers. A discrete analogue of (12)
holds for compositions in Examples 1 and 2: conditionally given $C_n^{\downarrow}$, the last part $L_n$ of $C_n$ is a size-biased
pick from all parts.



3.3 A characterisation of structural distributions 

The following characterisation of structural distributions is a minor extension of [17, Condition 1] to
include the heavy case.

Theorem 6 The structural distribution of the interval partition derived from a self-similar random set
S has the form

$P(V \in dx) = \frac{\nu[x, 1]}{(d + m)(1 - x)}\, dx + \frac{d}{d + m}\, \delta_0(dx)$ (13)

where $\nu$ is a measure on ]0, 1] satisfying

$m := \int_0^1 |\log(1 - x)|\, \nu(dx) < \infty$

and d is a nonnegative constant. Thus, the structural distribution may have an atom at 0, and otherwise
has a density $\phi(x)$, $0 < x < 1$, such that $(1 - x)\phi(x)$ is decreasing. The data $(d, \nu)$ are determined
uniquely up to a positive factor.

Proof. Assume the normalisation m = 1. Let $W = -\log S$ be stationary and X be a random variable
whose distribution coincides with the conditional distribution of the size of the gap of W covering 0, given
this size is positive. By the ergodic theorem, the part of the gap on the positive half-line is distributed
like XU, with U uniform [0, 1] independent of X. The conditional distribution of $A_1$ given $A_1 > 0$ is
then the same as for $1 - e^{-XU}$, which implies along the lines of the argument in [17, Section 4] that $A_1$ has
a density written as $\nu[x, 1]/(1 - x)$. The unconditional distribution in the form (13) follows by defining
d from

$P(A_1 = 0) = P(0 \in W) = \frac{d}{d + m}$
(where the middle term is the long-run Lebesgue measure of W per unit length). □



The structural distribution also accounts for some functionals of self-similar composition structures
which involve the ordering of parts. For a self-similar composition structure $(C_n)$ with binary
representation $(\xi_1, \xi_2, \ldots)$ define the potential function

$g(j) := P(\xi_j = 1), \quad j = 1, 2, \ldots$

In terms of balls-in-boxes, this is the probability, for each $n \ge j$, that the jth ball of $C_n$ falls in a different
box from its predecessor. Note that for a composition structure which was not right-consistent, the
analogous quantity would typically depend on n as well as j. In terms of $\mu_n := E K_n$ and the moments
of the structural distribution, we read from (11) that

$g(j) = \mu_j - \mu_{j-1} = E(1 - V)^{j-1}, \qquad \mu_n = \sum_{j=1}^{n} g(j)$

(where $\mu_0 = 0$).
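
The potential function can be computed from a CPF in two ways, as the probability of a 1 at place j and as an increment of the expected number of parts. The sketch below does both for the renewal structure of Example 2 and compares them with $(\alpha)_{j-1}/(j-1)!$, the renewal potential obtained from the generating function $1 - (1 - z)^{\alpha}$ of the spacings (a derivation assumed here, not quoted from the text). All names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def compositions(n):
    """All compositions of n >= 1, generated from their binary codes."""
    for tail in product("01", repeat=n - 1):
        parts = [1]
        for bit in tail:
            if bit == "1":
                parts.append(1)
            else:
                parts[-1] += 1
        yield tuple(parts)


def rising(x, n):
    """Rising factorial (x)_n."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result


def renewal_cpf(parts, alpha):
    """Product-form CPF of the renewal composition structure of Example 2."""
    prob = parts[-1] * alpha ** (len(parts) - 1)
    for r in parts:
        prob *= rising(1 - alpha, r - 1) / factorial(r)
    return prob


def digit_one_prob(cpf, j, n):
    """Probability that place j starts a new box, computed from the law of C_n (n >= j)."""
    total = Fraction(0)
    for lam in compositions(n):
        starts, position = set(), 1
        for r in lam:
            starts.add(position)
            position += r
        if j in starts:
            total += cpf(lam)
    return total


def expected_parts(cpf, n):
    """Expected number of parts of C_n, with the convention that it is 0 for n = 0."""
    if n == 0:
        return Fraction(0)
    return sum((cpf(lam) * len(lam) for lam in compositions(n)), Fraction(0))
```

That `digit_one_prob(cpf, j, n)` does not depend on n for n at least j is exactly the feature of right-consistent structures pointed out in the text.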

Nacu proved that the probability law of the general exchangeable partition of $\mathbb{N}$ is uniquely determined
by the distribution of the sequence $(I_j)$ of indicators of minimal elements of the blocks (so $I_j = 1$ means
that j is the minimal element in some block). In terms of Kingman's representation, $I_j = 1$ each time
$U_j$ discovers a new gap in Z or hits Z. For the Ewens composition structure of Example 1, $(I_j)$ has the
same distribution as $(\xi_j)$. For a general self-similar composition structure, the sequences are differently
distributed (as e.g. in Example 2), but the right-consistency of $C_n$ implies that

$P(I_j = 1) = P(\xi_j = 1) = g(j)$

because

$I_1 + \cdots + I_n \stackrel{d}{=} \xi_1 + \cdots + \xi_n \stackrel{d}{=} K_n$,

the number of parts of $C_n$.

3.4 A fragmentation product 

The following operation on self-similar sets generalises the one found in [51 1171 ITS] . Let $Z \subset \mathbb{R}_+$ be
self-similar and, independent of Z, let $(M_j)$ be independent copies of the same random closed set $M \subset [0, 1]$.
For each gap in Z with left endpoint $z_j \in Z$ and size $s_j$, fit the set $z_j + s_j M_j$ in this gap, and take the
union of Z and all these scaled shifted copies of M. Then the result $Z \otimes M$ (read Z fragmented by M) is
easily shown to be self-similar. For example, when M = {1/2} the set $Z \otimes M$ is obtained by adding the
midpoint for each gap in Z.

The operation has an analogue in terms of composition structures (as in [19]). For two composition
structures $(C_n)$ and $(C'_n)$, for each n, break the generic part of $C_n$, say r, into smaller parts according
to an independent copy of $C'_r$. The resulting sequence of compositions is a right-consistent composition
structure provided $(C_n)$ is so, and the corresponding self-similar random set is the fragmentation product
$Z \otimes M$ of the sets in Kingman's representation of $(C_n)$ and $(C'_n)$.

4 Markovian composition structures 
4.1 Decrement matrices 

The following definition extends the concept of a regenerative composition structure introduced in [5]
and prepares for the results in the next section.

Definition 7 A composition structure $(C_n)$ is called Markovian if for some infinite transition
probability matrices

$(q(n : m), 1 \le m \le n < \infty)$ and $(q^*(n : m), 1 \le m \le n < \infty)$
the distribution of each $C_n$ is given by the product formula

$p(\lambda) = q^*(n : \lambda_\ell) \prod_{k=1}^{\ell - 1} q(\Lambda_k : \lambda_k),$ (14)

where $\lambda = (\lambda_1, \ldots, \lambda_\ell)$ is a composition of n, and $\Lambda_k = \lambda_1 + \cdots + \lambda_k$ for $k \le \ell$.
Formula (14) has the following interpretation. Imagine a decreasing time-homogeneous Markov chain
$Q_n^{\downarrow} = (Q_n^{\downarrow}(t), t = 0, 1, \ldots)$ with state-space {1, 2, . . . , n} and terminal absorbing state 1. The chain has
initial distribution

$P(Q_n^{\downarrow}(0) = j) = q^*(n : n - j + 1), \quad j = 1, \ldots, n$

and it jumps from state j ($2 \le j \le n$) to i ($1 \le i < j$) with probability $q(j - 1 : j - i)$. We call $q^*$ and q
decrement matrices. In these terms, a random composition of n can be identified with a path of $Q_n^{\downarrow}$, i.e.
the binary representation of $C_n$ (for fixed n) is obtained by writing 1's in the positions visited by $Q_n^{\downarrow}$.

In the case $q^* = q$ the formula (14) defines a regenerative composition structure, as introduced in [5].
As mentioned in the Introduction, to fit in the present framework, the convention in that paper regarding
the ordering of blocks should be reversed.

Lemma 8 For a Markovian composition structure we have

$q(n : r) = \frac{r + 1}{n + 1}\, q(n + 1 : r + 1) + \frac{n + 1 - r}{n + 1}\, q(n + 1 : r) + \frac{1}{n + 1}\, q(n + 1 : 1)\, q(n : r)$ (15)

$q^*(n : r) = \frac{r + 1}{n + 1}\, q^*(n + 1 : r + 1) + \frac{n + 1 - r}{n + 1}\, q^*(n + 1 : r) + \frac{1}{n + 1}\, q^*(n + 1 : 1)\, q(n : r)$. (16)

Conversely, if two nonnegative matrices $q^*$ and q satisfy these recursions and $q^*(1 : 1) = q(1 : 1) = 1$
then they define a Markovian composition structure via (14).

Proof. The recursions follow from (14) and uniform consistency. When a composition is reduced by
sampling, the last block either remains unaltered or shrinks by one ball or, in the case the last block is a
singleton and gets deleted, coincides with the second-last block, whence (16).

The first recursion is familiar from llUj , but proving it under the more general assumption (14)
requires more algebra. For (a, b, c) a composition of n use uniform consistency to obtain

$p(a, b, c) = \frac{1}{n+1}\, p(1, a, b, c) + \frac{a+1}{n+1}\, p(a + 1, b, c) + \frac{1}{n+1}\, p(a, 1, b, c)$

$\qquad + \frac{b+1}{n+1}\, p(a, b + 1, c) + \frac{1}{n+1}\, p(a, b, 1, c) + \frac{c+1}{n+1}\, p(a, b, c + 1) + \frac{1}{n+1}\, p(a, b, c, 1)$.

Group the first three terms in the right side as

$\frac{a+1}{n+1}\, q^*(n + 1 : c)\, q(n + 1 - c : b)\, p(a)$

and factor all other terms using (14). Factor the left side as

$p(a, b, c) = q^*(n : c)\, q(a + b : b)\, p(a)$

and express $q^*(n : c)$ through $q^*(n + 1 : \cdot)$ using (16). Cancelling common terms and factors yields (15).
The converse is checked as in Proposition 3.3]. □ 

4.2 Kingman's representation 

Let $(Y_t, t \ge 0)$ be a subordinator (with $Y_0 = 0$), meaning an increasing Lévy process. Let $0 \le X \le \infty$ be
a random variable, independent of $(Y_t)$ and satisfying $P(X < \infty) > 0$. We call the process $(X + Y_t, t \ge 0)$
a delayed subordinator, and call its closed range W a delayed regenerative set. The distribution of W
determines that of X (because X = min W) and determines the Lévy parameters $(\nu, d)$ up to a positive
factor (since given $X < \infty$ the set $W - X$ is regenerative). Introduce the Lévy-Khintchine exponent

$\Phi(s) = ds + \int_0^{\infty} (1 - e^{-sy})\, \nu(dy),$ (17)

its two-parameter extension

$\Phi(n : m) = \binom{n}{m} \sum_{j=0}^{m} (-1)^{j+1} \binom{m}{j} \Phi(n - m + j), \quad 1 \le m \le n,$ (18)

and the moments

$\Psi(n : m) = \binom{n}{m} E\left(A_1^m (1 - A_1)^{n-m}\right), \quad 0 \le m \le n$ (19)

where $A_1 = 1 - \exp(-X)$.

Theorem 9 A composition structure $(C_n)$ is Markovian if and only if it can be derived by uniform
sampling from $Z = \exp(-W)$, with W being a delayed regenerative set. Explicitly, the distribution of
$(C_n)$ is given by the product formula (14) with decrement matrices

$q(n : m) = \frac{\Phi(n : m)}{\Phi(n)}$ (20)

$q^*(n : m) = \Psi(n : 0)\, q(n : m) + \Psi(n : m)$. (21)

Proof. The argument for the 'if' part follows the same line as in Theorem 5.2 (i)]. For the 'only if' part
let $C_n$ be derived by uniform sampling from the random closed set $Z \subset [0, 1]$. Assume first that $G_1 < 1$
a.s. for $G_1 = \sup(Z \cap [0, 1[)$. Let $L_n$ be the last part of $C_n$. Given $n - L_n = m$ let $C'_m$ be a composition
of m obtained by deleting the last part of $C_n$. Note that this definition does not depend on n > m and
that, by Lemma 8 and 9, Proposition 3.3], $(C'_m)$ is a regenerative composition structure.

Let $Z'_m$ be discrete random sets encoding $C'_m$, as in the proof of Theorem 2 but with 1 appended to
$Z'_m$. The set $Z_n$ (not containing 1) encoding $C_n$ can be represented as

$Z_n = (1 - L_n/n)\, Z'_{n - L_n}$

where $L_n$ and $(Z'_m)$ are independent. By the law of large numbers _& $Z_n$ converge to Z, while by
Theorem 5.2 (ii)] $Z'_m$ converge to some set $Z' = \exp(-W')$ with regenerative $W' \subset [0, \infty]$. As $n \to \infty$
the law of large numbers ensures that $1 - L_n/n \to G_1$ a.s., hence in the limit we have

$Z = G_1 Z'$

where $G_1$ and the set $Z'$ are independent. Hence the set $W = -\log Z$ is delayed regenerative.

The case $P(G_1 = 1) > 0$ is treated similarly. This can be viewed as a mixture (over $q^*$) of the trivial
one-block composition structure and another Markovian one. □



5 Self-similar Markov composition structures 

5.1 Markov sequences 

Let $Q^\uparrow = (Q^\uparrow_t,\ t = 0, 1, \ldots)$ be a time-homogeneous increasing Markov chain with the
state-space $\{1, 2, \ldots\}$ and the initial state $Q^\uparrow_0 = 1$. Define a string $\xi_1 \xi_2 \cdots$
by identifying the positions of 1's with the sequence of sites visited by $Q^\uparrow$:

$$\xi_j = 1(Q^\uparrow_t = j \text{ for some } t).$$

This defines a right-consistent sequence of compositions $(C_n)$, so that each $C_n$ encodes the path of
$Q^\uparrow$ killed before crossing level $n$. We will consider such compositions which are also
uniform-consistent, in which case (in view of Theorem 2) we will call $(C_n)$ a self-similar Markov
composition structure.

Bernoulli sequences in Example 1 yield self-similar Markov composition structures. Another instance 
is the renewal sequence in Example 2, with a discrete renewal process. 
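
The coding of a path by a binary string, and of the string by a composition, can be illustrated with a minimal sketch. The function names `binary_string` and `composition` are hypothetical and introduced only for illustration; the sample set of visited sites (1, 3, 7, 8) reproduces the composition (2, 4, 1, 2) of 9 used in the Introduction.

```python
def binary_string(sites, n):
    """Digits xi_1..xi_n: xi_j = 1 if and only if site j is visited."""
    visited = set(sites)
    return [1 if j in visited else 0 for j in range(1, n + 1)]


def composition(sites, n):
    """Composition of n encoded by the visited sites that lie in [1, n]."""
    ones = sorted(s for s in set(sites) if s <= n)
    if not ones or ones[0] != 1:
        raise ValueError("the chain must start at site 1")
    # Each part is the gap between consecutive visited sites;
    # the last part runs up to level n.
    bounds = ones + [n + 1]
    return tuple(b - a for a, b in zip(bounds, bounds[1:]))


path = [1, 3, 7, 8]
print(binary_string(path, 9))
print(composition(path, 9))
print(composition(path, 5))
```

Truncating the same path at a smaller level (here 5 instead of 9) only cuts the string on the right, which is the right-consistency referred to in the text.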

As the terminology is meant to suggest, self-similar Markov composition structures are Markov in the
sense of Section 4. To see this, for each $n$ consider a Markov chain $Q^\uparrow_n$ with state-space
$[n] \cup \{\infty\}$, such that $Q^\uparrow_n$ coincides with $Q^\uparrow$ as long as the latter stays in
$[n]$, but jumps to $\infty$ at the time when $Q^\uparrow$ exits $[n]$. Let $Q^\downarrow_n$ be a
time-reversal of $Q^\uparrow_n$, so that the initial state of $Q^\downarrow_n$ has the same distribution as
the value of $Q^\uparrow_n$ immediately before exiting $[n]$. The chains $Q^\downarrow_n$, $n = 1, 2, \ldots$
are coherent in the sense that, for $m < n$, when $Q^\downarrow_n$ enters $[m]$ its state has the same
distribution as the initial state of $Q^\downarrow_m$. Conversely, if the chains $(Q^\downarrow_n)$ are
coherent, their reversals $(Q^\uparrow_n)$ can be organised in a single 'super-chain' with state-space
$\{1, 2, \ldots\}$. So this property distinguishes the self-similar Markov case within the general Markov
case. Another feature characterising the coherent sequence is that there is a common potential function:
for all $n > m$ the probability that $Q^\downarrow_n$ visits state $m$ does not depend on $n$.

We recall that a stationary regenerative set [18] is the range of a process $(X + Y_t,\ t \ge 0)$ where
$(Y_t)$ is a subordinator with Lévy measure $\nu$ satisfying

$$m := \int_0^\infty y\,\nu(dy) < \infty, \tag{22}$$

$d \ge 0$ is some drift coefficient, $X$ is independent of $(Y_t)$ and has distribution

$$P(X \in dy) = \frac{\nu[y,\infty]}{d+m}\,dy + \frac{d}{d+m}\,\delta_0(dy). \tag{23}$$

Thus, $(X + Y_t)$ is a delayed subordinator with a special choice of distribution for $X$, to make the range
stationary.

Theorem 10 A composition structure $C$ is self-similar Markov if and only if its associated self-similar
set $Z$ can be presented as $Z = \exp(-W)$ where $W$ is a regenerative set with stationary delay.

Proof. Follows by combining Theorems and □ 



We see that self-similarity of Markov composition structures imposes further constraints on the decrement
matrices $q$ and $q^*$ in the product formula (14). Thus, $q$ can be associated only with a finite-mean
subordinator, and is given then by (20) with $\Phi$ as in (17) and (18). Similarly, $q^*$ is given by (21)
for $\Psi$ as in (19), and $A_1$ having distribution

$$P(A_1 \in dx) = \frac{\tilde\nu[x,1]\,dx}{(d+m)(1-x)} + \frac{d}{d+m}\,\delta_0(dx),$$

where $\tilde\nu$ is the image of $\nu$ under $y \mapsto 1 - e^{-y}$. By Theorem la this is also the
structural distribution of $Z$, and comparing with (13) we observe that the distribution is of exactly the
same type as for the general self-similar $Z$ according to Theorem [BJ. For the potential function there is
a simple formula

$$g(j) = \frac{1}{d+m}\,\frac{\Phi(j-1)}{j-1} \quad \text{for } j > 1, \qquad g(1) = 1, \tag{24}$$
which appeared in [7, p. 86] in a special case, and the transition probabilities $f$ of $Q^\uparrow$ are
recovered from

$$f(j \mid i) = \frac{g(j)\, q(j-1 : j-i)}{g(i)}, \qquad i < j.$$

The relation between a regenerative composition with decrement matrix $q$ and the associated self-similar
Markov composition structure with matrices $q$ and $q^*$ (with $q^*$ given by (21)) is the combinatorial
counterpart of the relation between a subordinator and its stationary version.



5.2 Arrangements 

A difficult and interesting question is the relation between partition structures and their possible
arrangements as composition structures with certain properties. Some aspects of this problem were treated in
[HI EI. Although we do not know a simple algorithm to check if the blocks of a given partition may be
ordered to produce a self-similar Markov composition structure, we can show the uniqueness.

Proposition 11 If a partition structure admits an arrangement as a self-similar Markov composition 
structure, then such arrangement is unique in distribution. 

The claim follows from the next lemma by recalling that the moments of the structural distribution $p(n)$
are determined by the associated partition structure.

Lemma 12 For $(C_n)$ a self-similar Markov composition structure, for each $n$ the distribution of $C_n$
is uniquely determined by the structural moments $p(1) = 1, p(2), \ldots, p(n+1)$.

Proof. A binomial expansion in (10) shows that $\mu_{n,r}$, $1 \le r \le n$, are computable from
$p(1), \ldots, p(n)$. By Theorem 0| and the formula $q^*(n:r) = r\,\mu_{n,r}/n$ also
$(q^*(n':r),\ 1 \le r \le n' \le n+1)$ is computable from $p(1), \ldots, p(n+1)$. Applying (16) we see by
induction that the minor of the decrement matrix $(q(n':r),\ 1 \le r \le n' \le n)$ is computable from
$p(1), \ldots, p(n+1)$, which taken together with (14) proves the claim. □



For the regenerative case (when $q = q^*$) we have shown that only the moments $p(2), \ldots, p(n)$ are
needed to recover the distribution of the composition of order $n$ [Proposition 7.1]. The explicit formulas
are rather involved already in that case.

While a self-similar Markov arrangement (if any) of a partition structure is unique, many self-similar
composition structures may project onto the same partition structure. For example, for self-similar $Z$,
$M \subset [0,1]$ and $M'$ the reflection of $M$ about 1/2, both fragmentation products $Z \otimes M$ and
$Z \otimes M'$ induce the same partition structure, but the composition structures are different, unless
$M = M'$. This implies nonuniqueness in the problem of binary representability of partition structures
studied in [19].



6 The two-parameter family 

We are interested in self-similar composition structures associated with the members of the two-parameter
family of partition structures ^S]. For the range of parameters $\theta > -\alpha$, $0 \le \alpha < 1$ these
partition structures may be introduced as follows.

Let $(V_i)$ or $(V_i, i \in I)$ denote a random discrete distribution, that is a collection of random
variables indexed by $i$ in some finite or countably infinite set $I$, with

$$V_i \ge 0 \quad \text{and} \quad \sum_i V_i = 1 \quad \text{almost surely}.$$

We use $\{V_i\}$ as an informal notation for the multi-set of all non-zero values of $V_i$, without regard
to how they are indexed by $I$. Formally, $\{V_i\}$ is encoded by the sequence
$(V_j, j = 1, 2, \ldots) := \mathrm{RANK}(V_i)$, meaning that $(V_j, j = 1, 2, \ldots)$ is the decreasing
rearrangement of $(V_i)$ with padding by zeros if necessary. Let us write simply

$$\{V_i\} \sim (\alpha, \theta) \tag{25}$$

if $\mathrm{RANK}(V_i)$ has the Poisson-Dirichlet distribution with two parameters $(\alpha, \theta)$,
defined following [16, 15] as the distribution of $\mathrm{RANK}(\tilde V_i)$ where

$$\tilde V_1 := W_1 \tag{26}$$

has beta$(1-\alpha, \alpha+\theta)$ distribution and for $i > 1$

$$\tilde V_i := (1 - W_1) \cdots (1 - W_{i-1})\, W_i$$

where $W_i$ has beta$(1-\alpha, \theta + i\alpha)$ distribution, and the $W_i$ are independent. It is known
^1 that if $\{V_i\} \sim (\alpha, \theta)$ then such $\tilde V_i$ can be constructed by size-biased random
permutation of $\{V_i\}$. Then $\tilde V_1 = V_J$ for a random index $J$ with

$$P(J = j \mid \{V_i\}) = V_j,$$

while

$$\{V_j^*\} := \left\{ \frac{V_j}{1 - \tilde V_1},\ j \ne J \right\} \sim (\alpha, \alpha + \theta) \tag{27}$$

and $\{V_j^*\}$ is independent of $\tilde V_1$.
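
The stick-breaking description of the size-biased sequence can be sketched numerically as follows. This is only an illustration under stated assumptions: the function name `gem_weights` and the truncation level `k` are hypothetical choices, the beta parameters are taken to be $(1-\alpha, \theta + i\alpha)$ for the $i$-th factor, and the sampler is Python's standard `random.betavariate`.

```python
import random


def gem_weights(alpha, theta, k, rng=random):
    """First k stick-breaking weights for parameters (alpha, theta).

    Assumes 0 <= alpha < 1 and theta > -alpha, so that both beta
    parameters passed to betavariate are strictly positive.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for i in range(1, k + 1):
        w = rng.betavariate(1 - alpha, theta + i * alpha)
        weights.append(remaining * w)
        remaining *= 1 - w
    return weights


rng = random.Random(2024)
sample = gem_weights(0.5, 1.0, 20, rng)
print(sorted(sample, reverse=True)[:5])  # leading terms of the ranked sequence
```

Sorting a long enough truncation in decreasing order approximates the ranked sequence; the truncation discards only the mass of the unbroken remainder of the stick.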



Since $\{V_i\}$ can be measurably recovered from $\tilde V_1$ and $\{V_j^*\}$ as

$$\{V_i\} = \{\tilde V_1\} \cup \{(1 - \tilde V_1)\, V_j^*\} \tag{28}$$

an immediate consequence is

Lemma 13 ^3 Proposition 35] If $\tilde V_1$ has beta$(1-\alpha, \alpha+\theta)$ distribution and
$\{V_j^*\} \sim (\alpha, \theta+\alpha)$, and $\{V_i\}$ is defined by (28), then
$\{V_i\} \sim (\alpha, \theta)$ and $\tilde V_1$ is a size-biased pick from $\{V_i\}$.

In [H] we established that for $0 \le \alpha < 1$, $\theta \ge 0$ a random discrete distribution
$\{V_i\} \sim (\alpha, \theta)$ can be derived from the interval partition of $[0,1]$ associated with a
unique regenerative composition structure. Specifically, the image of the Lévy measure of this
$(\alpha, \theta)$ regenerative composition structure under $y \mapsto 1 - e^{-y}$ is the measure
$\tilde\nu$ on $]0,1]$ characterised by

$$\tilde\nu[x,1] = x^{-\alpha}(1-x)^{\theta} \tag{29}$$

and the decrement matrix is

$$q(n:r) = \binom{n}{r} \frac{(1-\alpha)_{r-1}}{(\theta+n-r)_r}\, \frac{(n-r)\alpha + r\theta}{n}. \tag{30}$$
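
As a sanity check on the decrement matrix, the following sketch evaluates $q(n:r) = \binom{n}{r} \frac{(1-\alpha)_{r-1}}{(\theta+n-r)_r} \frac{(n-r)\alpha + r\theta}{n}$ in exact rational arithmetic and confirms that each row sums to one. The helper names `rising` and `q_regen` are hypothetical, and the parameter values are arbitrary rationals chosen for illustration (with $\theta > 0$, so that no denominator vanishes).

```python
from fractions import Fraction
from math import comb


def rising(x, k):
    """Pochhammer symbol (x)_k = x (x + 1) ... (x + k - 1)."""
    out = Fraction(1)
    for i in range(k):
        out *= x + i
    return out


def q_regen(n, r, alpha, theta):
    """Entry q(n:r) of the (alpha, theta) decrement matrix; needs theta > 0."""
    return (comb(n, r) * rising(1 - alpha, r - 1) / rising(theta + n - r, r)
            * ((n - r) * alpha + r * theta) / n)


alpha, theta = Fraction(1, 3), Fraction(1, 2)
for n in range(1, 6):
    row = [q_regen(n, r, alpha, theta) for r in range(1, n + 1)]
    print(n, [str(x) for x in row], "row sum =", sum(row))
```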

By combining these known results we now obtain the following: 

Theorem 14 For $0 \le \alpha < 1$, $\theta > 0$ let $Z = \exp(-W)$ where $W$ is the stationary version of
the regenerative set associated as above with an $(\alpha, \theta)$ regenerative composition structure. Then
$Z$ is a self-similar Markov random set associated with an $(\alpha, \theta - \alpha)$ partition structure.
The structural distribution of $Z$ is beta$(1-\alpha, \theta)$, and $Z$ has the strong sampling property (12).

Proof. The structural distribution of $Z$ is read from (29), (13) and (23). The construction of $Z$ allows
the application of Lemma 13 with $\theta$ replaced by $\theta - \alpha$, to deduce the other conclusions. □

Theorem 14 can also be derived more combinatorially as follows. Consider the Pólya-Eggenberger
distributions

$$q_{\alpha,\theta}(n:r) = \binom{n-1}{r-1} \frac{(\theta+\alpha)_{n-r}\,(1-\alpha)_{r-1}}{(\theta+1)_{n-1}}, \qquad r = 1, \ldots, n,$$

and define a function on compositions

$$\hat\pi_{\alpha,\theta}(\lambda) = \prod_{k=1}^{\ell} q_{\alpha,\theta+(\ell-k)\alpha}(\Lambda_k : \lambda_k), \qquad \lambda = (\lambda_1, \ldots, \lambda_\ell), \tag{31}$$

where $\Lambda_k = \lambda_1 + \ldots + \lambda_k$. The formula (31) is the distribution of the
$(\alpha, \theta)$ partition structure with parts arranged from right to left in a size-biased order. The
$(\alpha, \theta)$-partition structure is defined then by the partition probability function obtained by the
symmetrisation of the CPF (see ^Oj for more details of this procedure):

$$\pi_{\alpha,\theta}(\lambda^\downarrow) = \sum_{\text{distinct } \sigma} \hat\pi_{\alpha,\theta}(\lambda^\sigma), \tag{32}$$

where the summation extends over all distinct permutations
$\lambda^\sigma = (\lambda_{\sigma(1)}, \ldots, \lambda_{\sigma(\ell)})$ of parts of composition $\lambda$.
From (31) and (32) follows the recursion

$$\pi_{\alpha,\theta}(\lambda^\downarrow) = \sum_{\text{distinct } \lambda_j \in \lambda^\downarrow} q_{\alpha,\theta}(n : \lambda_j)\, \pi_{\alpha,\theta+\alpha}(\lambda^\downarrow - \lambda_j), \tag{33}$$

where $\lambda^\downarrow$ is a (ranked, unordered) partition of $n$ and where
$\lambda^\downarrow - \lambda_j$ is the partition $\lambda^\downarrow$ without part $\lambda_j$. Let $C_n$
denote the self-similar Markov composition structure derived from $S = \exp(-W)$ as in the theorem.
Computing beta integrals to determine the distribution $q^*(n:\cdot)$ of the last part of $C_n$ according to
(21) we obtain

$$q^*(n:r) = \binom{n}{r} E\left[A_1^r (1-A_1)^{n-r}\right] + E\left[(1-A_1)^n\right] q(n:r)$$

$$= \binom{n}{r} \frac{B(r+1-\alpha,\ n-r+\theta)}{B(1-\alpha,\theta)} + \frac{B(1-\alpha,\ n+\theta)}{B(1-\alpha,\theta)} \binom{n}{r} \frac{(1-\alpha)_{r-1}}{(\theta+n-r)_r}\, \frac{(n-r)\alpha + r\theta}{n},$$

which upon simplification shows that $q^* = q_{\alpha,\theta-\alpha}$. If the last part of $C_n$ is $r$ then
the rest of $C_n$ must be a copy of the $(\alpha, \theta)$ regenerative composition of $n - r$, hence the
partition structure can be recovered from

$$\pi(\lambda^\downarrow) = \sum_{\text{distinct } \lambda_j \in \lambda^\downarrow} q^*(n:\lambda_j)\, \pi_{\alpha,\theta}(\lambda^\downarrow - \lambda_j),$$

which by comparison with (33) shows that $\pi = \pi_{\alpha,\theta-\alpha}$, in accordance with the
conclusion of the theorem.
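
The simplification $q^* = q_{\alpha,\theta-\alpha}$ can be checked in exact arithmetic. The sketch below assumes that $A_1$ has the beta$(1-\alpha, \theta)$ distribution, so that $E[A_1^a (1-A_1)^b] = (1-\alpha)_a (\theta)_b / (1-\alpha+\theta)_{a+b}$, and it takes the formulas for the regenerative decrement matrix and for the Pólya-Eggenberger distribution in the form stated in the text. All function names are hypothetical, and the rational parameter values are arbitrary.

```python
from fractions import Fraction
from math import comb


def rising(x, k):
    """Pochhammer symbol (x)_k."""
    out = Fraction(1)
    for i in range(k):
        out *= x + i
    return out


def q_regen(n, r, alpha, theta):
    """Regenerative (alpha, theta) decrement matrix."""
    return (comb(n, r) * rising(1 - alpha, r - 1) / rising(theta + n - r, r)
            * ((n - r) * alpha + r * theta) / n)


def q_polya(n, r, alpha, theta):
    """Polya-Eggenberger distribution q_{alpha,theta}(n:r)."""
    return (comb(n - 1, r - 1) * rising(theta + alpha, n - r)
            * rising(1 - alpha, r - 1) / rising(theta + 1, n - 1))


def q_star(n, r, alpha, theta):
    """Last-part distribution with A_1 ~ beta(1 - alpha, theta) (assumption)."""
    norm = rising(1 - alpha + theta, n)
    psi_nr = comb(n, r) * rising(1 - alpha, r) * rising(theta, n - r) / norm
    psi_n0 = rising(theta, n) / norm  # E[(1 - A_1)^n]
    return psi_nr + psi_n0 * q_regen(n, r, alpha, theta)


alpha, theta = Fraction(1, 3), Fraction(1, 2)
identity_holds = all(
    q_star(n, r, alpha, theta) == q_polya(n, r, alpha, theta - alpha)
    for n in range(1, 8) for r in range(1, n + 1)
)
print(identity_holds)
```

Using `Fraction` instead of floating point makes the comparison an exact equality test rather than a tolerance check.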

Corollary 15 For $0 \le \alpha < 1$, $\theta > -\alpha$ each $(\alpha, \theta)$ partition structure has a
distributionally unique arrangement as a self-similar Markov composition structure.
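
The symmetrisation of the composition probability function can be tested on small $n$. The sketch below builds the product of Pólya-Eggenberger factors over all compositions of $n$, sums over orderings with the same ranked parts, and compares the result with the two-parameter (Pitman) sampling formula for partitions of an integer. That formula is quoted here from the general literature rather than from this section, so its use is an assumption of the example, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from math import comb, factorial


def rising(x, k):
    out = Fraction(1)
    for i in range(k):
        out *= x + i
    return out


def q_polya(n, r, alpha, theta):
    return (comb(n - 1, r - 1) * rising(theta + alpha, n - r)
            * rising(1 - alpha, r - 1) / rising(theta + 1, n - 1))


def compositions(n):
    """All compositions of n as tuples of positive integers."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def pi_hat(lam, alpha, theta):
    """Product over k of q_{alpha, theta + (l - k) alpha}(Lambda_k : lambda_k)."""
    ell, total, prob = len(lam), 0, Fraction(1)
    for k, part in enumerate(lam, start=1):
        total += part
        prob *= q_polya(total, part, alpha, theta + (ell - k) * alpha)
    return prob


def symmetrised(n, alpha, theta):
    """Sum pi_hat over the distinct orderings of each ranked partition."""
    probs = defaultdict(Fraction)
    for lam in compositions(n):
        probs[tuple(sorted(lam, reverse=True))] += pi_hat(lam, alpha, theta)
    return dict(probs)


def pitman_formula(partition, alpha, theta):
    """Two-parameter sampling formula for a partition of an integer."""
    n, k = sum(partition), len(partition)
    prob = Fraction(factorial(n))
    for part in partition:
        prob *= rising(1 - alpha, part - 1) / factorial(part)
    for mult in Counter(partition).values():
        prob /= factorial(mult)
    for i in range(1, k):
        prob *= theta + i * alpha
    return prob / rising(theta + 1, n - 1)


table = symmetrised(4, Fraction(1, 3), Fraction(1, 2))
for partition, prob in sorted(table.items()):
    print(partition, prob, prob == pitman_formula(partition, Fraction(1, 3), Fraction(1, 2)))
```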

There is an explicit stochastic algorithm which allows, for each $n$, arranging an unordered collection
of parts of an $(\alpha, \theta)$ partition into a Markovian self-similar composition. Given a partition
$\lambda^\downarrow$ choose a part $\lambda_j$ by a size-biased pick and place it at the right end of the
composition under construction. Then arrange the remaining parts $\lambda^\downarrow - \lambda_j$
one-by-one, as for the regenerative $(\alpha, \theta + \alpha)$ composition (from right to left), using the
appropriate deletion kernel. Specifically, when the remaining partition is $\mu$, the algorithm selects each
part of size $r$ of $\mu$ with probability

$$\frac{(n-r)\tau + r(1-\tau)}{n\,(1 - \tau + (k-1)\tau)},$$

where $\tau = \alpha/(2\alpha + \theta)$, $k$ is the number of parts of $\mu$ and $n$ is the sum of the parts
of $\mu$; then the same procedure is applied to the reduced partition, etc. For example, consider the
$(\alpha, 0)$ partition of $n$: assuming it has $\ell$ parts, after placing a size-biased pick the remaining
$\ell - 1$ parts should be arranged in a random order, with all $(\ell - 1)!$ orders being equally likely.
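
The arrangement procedure can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the function names `deletion_probs` and `arrange` are hypothetical, the selection probability is taken to be $((n-r)\tau + r(1-\tau)) / (n(1-\tau+(k-1)\tau))$ with $n$ the sum of the remaining parts and $\tau = \alpha/(2\alpha+\theta)$, and the random choices use Python's standard `random.choices`.

```python
import random
from fractions import Fraction


def deletion_probs(parts, tau):
    """Selection probability of each part of the remaining partition."""
    n, k = sum(parts), len(parts)
    denom = n * (1 - tau + (k - 1) * tau)
    return [((n - r) * tau + r * (1 - tau)) / denom for r in parts]


def arrange(partition, alpha, theta, rng=random):
    """Arrange the parts of a partition into a composition, right to left."""
    parts = list(partition)
    # Step 1: a size-biased pick is placed at the right end.
    j = rng.choices(range(len(parts)), weights=parts)[0]
    right_to_left = [parts.pop(j)]
    # Step 2: remaining parts are placed one by one via the deletion kernel.
    tau = alpha / (2 * alpha + theta)
    while parts:
        weights = [float(p) for p in deletion_probs(parts, tau)]
        j = rng.choices(range(len(parts)), weights=weights)[0]
        right_to_left.append(parts.pop(j))
    return tuple(reversed(right_to_left))


print(arrange((4, 2, 2, 1), 0.5, 1.0, random.Random(7)))
# With theta = 0 we get tau = 1/2 and the kernel is uniform over the parts.
print(deletion_probs([3, 2, 1], Fraction(1, 2)))
```

The uniform kernel for $\tau = 1/2$ agrees with the $(\alpha, 0)$ example in the text, where the remaining parts are placed in a uniformly random order.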

By the very construction, conditionally given the parts, the last part is a size-biased pick from all
parts: this feature is a combinatorial analogue of the strong sampling property in Section 3 (as was stated
for the $(\alpha, 0)$ case in ^] Proposition 11 (i)]). Algebraically, the combinatorial strong sampling
property amounts to the identity

$$\frac{q_{\alpha,\theta}(n:\lambda_j)\, \pi_{\alpha,\alpha+\theta}(\lambda^\downarrow - \lambda_j)}{\sum_i q_{\alpha,\theta}(n:\lambda_i)\, \pi_{\alpha,\alpha+\theta}(\lambda^\downarrow - \lambda_i)} = \frac{\lambda_j}{n}.$$

6.1 Case ($\alpha = 0$, $\theta > 0$)

This is the case of Example 1, with independent digits and potential function

$$g(j) = \frac{\theta}{j + \theta - 1}.$$

Here $S$ is PPP$(\theta\,dy/y)$. A characteristic feature is that it is the only self-similar Markov
composition structure which is right-regenerative. Indeed, if a random set is both regenerative and
stationary regenerative, it is a homogeneous PPP.



The transition function for the chain $Q^\uparrow$ is

$$f(j \mid i) = \frac{\theta\,(j-2)!\,(\theta)_i}{(i-1)!\,(\theta)_j}.$$
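
A quick consistency check of the transition function: assuming the closed form $f(j \mid i) = \theta (j-2)! (\theta)_i / ((i-1)! (\theta)_j)$ for $i < j$ and independent digits with success probabilities $g(k) = \theta/(k+\theta-1)$, the same quantity is the probability of zeros at positions $i+1, \ldots, j-1$ followed by a one at position $j$. The function names below are hypothetical.

```python
from fractions import Fraction
from math import factorial


def rising(x, k):
    """Pochhammer symbol (x)_k."""
    out = Fraction(1)
    for i in range(k):
        out *= x + i
    return out


def g(j, theta):
    """Probability that digit j equals 1 (potential function)."""
    return theta / (j + theta - 1)


def f_closed(i, j, theta):
    """Closed form for the jump probability from site i to site j > i."""
    return (theta * factorial(j - 2) * rising(theta, i)
            / (factorial(i - 1) * rising(theta, j)))


def f_product(i, j, theta):
    """Same probability computed digit by digit from independence."""
    prob = g(j, theta)
    for k in range(i + 1, j):
        prob *= 1 - g(k, theta)
    return prob


theta = Fraction(3, 2)
print(f_closed(2, 5, theta), f_product(2, 5, theta))
```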



Remark on records. The case $\theta = 1$ has a classical interpretation in terms of indicators of records in
a sequence of i.i.d. random variables with some continuous distribution. With reference to a question
left open in 0] p. 297], a similar interpretation exists for any $\theta > 0$, but the distributions of the
independent variables should be different. One possibility, based on a planar homogeneous Poisson process,
is the following: divide the positive quadrant $\mathbb{R}^2_+$ into vertical strips of widths
$\beta_j = (\theta)_{j-1}/(j-1)!$ and define the variables to be the heights of the lowest Poisson atoms in
the strips, from left to right. Elementary algebra shows that, to agree with ESF$(\theta)$, the collection
of $\beta_j$'s must be as above up to a common positive
factor. The same distribution of record indicators appears for an independent sample from distributions
$F^{\beta_1}, F^{\beta_2}, \ldots$ where $F$ is an arbitrary continuous distribution on $\mathbb{R}$.
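
The "elementary algebra" behind the strip construction can be checked exactly. The sketch assumes a unit-intensity Poisson process, so that the lowest atoms in the strips are independent exponential heights with rates equal to the strip widths $\beta_j = (\theta)_{j-1}/(j-1)!$; the $j$-th height is then a (lower) record with probability $\beta_j/(\beta_1 + \cdots + \beta_j)$, which should equal $\theta/(\theta+j-1)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def rising(x, k):
    """Pochhammer symbol (x)_k."""
    out = Fraction(1)
    for i in range(k):
        out *= x + i
    return out


def strip_width(j, theta):
    """Width of the j-th vertical strip."""
    return rising(theta, j - 1) / factorial(j - 1)


def record_prob(j, theta):
    """Probability that the lowest atom of strip j lies below those of strips 1..j-1."""
    widths = [strip_width(k, theta) for k in range(1, j + 1)]
    return widths[-1] / sum(widths)


theta = Fraction(5, 2)
for j in range(1, 6):
    print(j, record_prob(j, theta), theta / (theta + j - 1))
```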
6.2 Case ($0 < \alpha < 1$, $\theta = 0$)

The range $S$ of an $\alpha$-stable subordinator induces the renewal composition structure of Example 2.
This is the self-similar version of the regenerative $(\alpha, \alpha)$ composition structure, whose Lévy
measure after the transform $x = 1 - e^{-y}$ is $\tilde\nu_{\alpha,\alpha}$ defined by

$$\tilde\nu_{\alpha,\alpha}[x,1] = x^{-\alpha}(1-x)^{\alpha},$$

hence the potential function is

$$g(j) = \frac{\Phi(j-1)}{m_{\alpha,\alpha}\,(j-1)} = \frac{(\alpha)_{j-1}}{(j-1)!}$$

(where $m_{\alpha,\alpha}$ is the mean value as in (22)).

The induced composition structure is self-similar Markov as well as left-regenerative. Thus
$S \cap [0,1] = 1 - e^{-R}$ for $R$ the range of another, killed, subordinator, as detailed in J!J|. The
combination of the two regeneration properties is characteristic:

Proposition 16 If a composition structure $(C_n)$ is both Markov self-similar and left-regenerative, then
$(C_n)$ is the $(\alpha, 0)$ composition structure derived by sampling from the range of some
$\alpha$-stable subordinator.

Proof. Let $Z$ be the set in Kingman's representation of $(C_n)$. The left regeneration property implies
that $Z$ is the range of a multiplicative subordinator $1 - e^{-A}$, where $A$ is some subordinator. On the
other hand, by Theorem 10, $Z = e^{-B}$ for $B$ some stationary delayed subordinator, hence $Z$ has a
nontrivial meander with positive probability, which implies that $A$ has a positive killing rate. Let $Z_0$
be the set $Z$ conditioned on zero meander, which is the range of the multiplicative subordinator
$1 - e^{-A_0}$, for $A_0$ the version of $A$ without killing. Then, of course, $Z_0 = e^{-B_0}$ for $B_0$
the version of $B$ but with zero delay. It follows that the composition structure induced by $Z_0$ is both
left- and right-regenerative, that is both sets $Z_0$ and $1 - Z_0$ are multiplicatively regenerative. By
[Theorem 12.1 and Corollary 12.2], $Z_0 \stackrel{d}{=} 1 - Z_0$, $Z_0$ is the zero set of a Bessel bridge,
and the composition structure induced by $Z_0$ is of type $(\alpha, \alpha)$. By Theorem 14 the stationary
version of this composition structure is of type $(\alpha, 0)$, and $Z$ is the range (restricted to $[0,1]$)
of some $\alpha$-stable subordinator. □

Remark. This result complements the characterisation of $(\alpha, \alpha)$ regenerative composition
structures in [Theorem 12.1]. Apparently, the assumption of the Markov property can be omitted, i.e. it
seems sufficient to assume only that $(C_n)$ is right-consistent. That the Markov property follows is not
obvious, because the left-regeneration property of $(C_n)$ does not imply the right Markov property of the
composition in the sense of Definition (which requires time-homogeneity of the Markov chain). Still, a
plausible argument is the following. As above, define the left-regenerative (multiplicatively) set $Z_0$ by
conditioning on zero meander (a limiting procedure required to justify this definition is obvious). Fix
$x \in ]0,1[$ and condition on $x \in Z_0$; then, because $Z$ is self-similar,
$[0,x] \cap Z_0 \stackrel{d}{=} x Z_0$. But by the left-regeneration (multiplicative) property,
$[0,x] \cap Z_0$ is independent of $[x,1] \cap Z_0$, whence the right-regeneration (multiplicative)
property. Then the conclusion is as above. A loose point in this argument is the conditioning on the zero
event $x \in Z_0$.

6.3 Case $(\alpha, \alpha)$

For this partition there is a regenerative arrangement (the composition structure induced by the Bessel
bridge) and another self-similar Markov arrangement. The latter is the self-similar version of the
regenerative $(\alpha, 2\alpha)$ composition.

6.4 General $0 \le \alpha < 1$, $\theta > -\alpha$

Explicit construction of the self-similar Markov composition structure associated with the
$(\alpha, \theta)$ partition structure exploits the fragmentation product introduced in Section IPl. One
ingredient is the Poisson process $Z = \mathrm{PPP}(\theta\,dy/y)$, $\theta > 0$, restricted to $[0,1]$.
Another factor is the set $M' = 1 - M \cap [0,1]$ obtained by reflecting the range $M$ of the
$\alpha$-stable subordinator. The self-similar set is defined then as the fragmentation product
$Z \otimes M'$, and the induced composition is the self-similar Markov version of
partition $(\alpha, \theta - \alpha)$. Conditioning on zero meander will produce a set corresponding to the
$(\alpha, \theta)$ regenerative composition, as in

Unlike $M$, the set $M'$ exploited here has the leftmost meander interval. The fragmentation product
$Z \otimes M$ was introduced in [17]: the resulting composition structure is right-consistent but not
Markovian.